SNMPs are membrane proteins observed to associate with chemosensory neurons in insects; in
Drosophila melanogaster, SNMP1 has been shown to be essential for the detection of the pheromone
cis-vaccenyl acetate (CVA). SNMPs are one of three insect gene clades related to the human fatty acid
transporter CD36. We previously characterized the CD36 gene family in 4 insect Orders that effectively
cover the Holometabola, or some 80% of known insect species and the 300 million years of evolution
since this lineage emerged: Lepidoptera (e.g. Bombyx mori, Antheraea polyphemus, Manduca sexta,
Heliothis virescens, Helicoverpa assulta, Helicoverpa armigera, Mamestra brassicae); Diptera (D. melanogaster,
Drosophila pseudoobscura, Aedes aegypti, Anopheles gambiae, Culex pipiens quinquefasciatus);
Hymenoptera (Apis mellifera); and Coleoptera (Tribolium castaneum). This previous study suggested a complex
topography within the SNMP clade including a strongly supported SNMP1 sub-clade plus additional
SNMP genes. To further resolve the SNMP clade here, we used cDNA sequences of SNMP1 and SNMP2
from various Lepidoptera species, D. melanogaster and Ae. aegypti, as well as BAC derived genomic
sequences from Ae. aegypti as models for proposing corrected sequences of orthologues in the
D. pseudoobscura and An. gambiae genomes, and for identifying orthologues in the B. mori and C. pipiens
q. genomes. We then used these sequences to analyze the SNMP clade of the insect CD36 gene family,
supporting the existence of two well supported sub-clades, SNMP1 and SNMP2, throughout the dipteran
and lepidopteran lineages, and plausibly throughout the Holometabola and across a broad evolutionary
time scale. We present indirect evidence based on evolutionary selection (dN/dS) that the dipteran
SNMPs are expressed as functional proteins. We observed expansions of the SNMP1 sub-clade in
C. pipiens q. and T. castaneum, suggesting that the SNMP1s may have an expanded functional role in these
species.

©  2009  Elsevier  Ltd.  All  rights  reserved. 


1.  Introduction 

SNMPs are insect membrane proteins which associate with
pheromone sensitive neurons in Lepidoptera and Diptera (Rogers
et al., 1997, 2001a, b; Vogt, 2003; Benton et al., 2007; Forstner et al.,
2008; Jin et al., 2008). SNMPs comprise a sub-clade of insect genes
related to the human fatty acid transport protein CD36
(Fig. 1A; Nichols and Vogt, 2008). The first SNMPs were identified
from Lepidoptera (Rogers et al., 1997, 2001a, b; Krieger et al., 2002;
Forstner et al., 2008). SNMPs from several lepidopteran species,
here referred to as SNMP1, were shown to be antenna specific,
associating with pheromone specific olfactory neurons in a manner
suggesting they play a central role in pheromone detection (Rogers
et al., 1997, 2001a, b; Krieger et al., 2002). A second lepidopteran
SNMP, SNMP2, also associates with pheromone sensitive sensilla
but has been shown to be expressed in sensilla support cells rather than
neurons (Rogers et al., 2001b; Forstner et al., 2008). Recently,
a Drosophila melanogaster SNMP1 (SNMP1Dmel, cg7000) was found
to be essential for the detection of the pheromone cis-vaccenyl
acetate (CVA) (Benton et al., 2007; Jin et al., 2008); this protein was
expressed not only in antennae, but also in other body parts such as
legs and wings (Benton et al., 2007; Miller et al., 2007).

We recently surveyed the insect CD36/SNMP gene family from
the genomes of 6 insect species: the fruitflies (Diptera) D. melanogaster
and Drosophila pseudoobscura; the mosquitoes (Diptera)
Anopheles gambiae and Aedes aegypti; the honeybee (Hymenoptera)
Apis mellifera; and the beetle (Coleoptera) Tribolium castaneum
(Nichols and Vogt, 2008). This study suggested that the SNMPs
comprised one of three major clades of the insect CD36 gene family;
the SNMP clade showed a complex topography with a well
supported SNMP1 sub-clade plus additional SNMP genes. To resolve
this topography, our current study focuses on the SNMP clade from
the dipteran and lepidopteran genomes, adding the SNMP genes of
the mosquito Culex pipiens quinquefasciatus and the silk moth
Bombyx mori. These genes are compared with homologues from
Hymenoptera and Coleoptera, affording a survey of the majority of
the holometabolous insect lineage (Fig. 1B), which includes at least
80% of all known insect species. The Holometabola are thought to
have emerged around 300+ million years ago (Mya), and the
Lepidoptera/Trichoptera and Diptera to have diverged around 290
Mya (Grimaldi and Engel, 2005). The Drosophila and mosquito
lineages are thought to have diverged 240 Mya (Grimaldi and Engel,
2005). The D. melanogaster and D. pseudoobscura lineages diverged
65-43 Mya (O'Grady, 1999; Tamura et al., 2004), and are among the
most widely diverged genomes available for this genus. Mosquitoes
(Culicidae) comprise three subfamilies, two of which include
the blood feeding genera: Culicinae (including Aedes sp. and Culex
sp.) and Anophelinae (including Anopheles sp.). The Culicinae and
Anophelinae lineages diverged 200-145 Mya (Krzywinski et al.,
2006) while the Aedes and Culex lineages diverged 52-24 Mya
(Foley et al., 1998).

[Figure 1. Panel A: Insect (Holometabola) CD36 gene family, with Clade 3 = SNMPs. Panel B: Holometabola phylogeny (11 Orders), showing Bombyx mori (Lepidoptera), Drosophila melanogaster (Diptera), Drosophila pseudoobscura (Diptera), Anopheles gambiae (Diptera), Aedes aegypti (Diptera), Culex pipiens quinquefasciatus (Diptera), Apis mellifera (Hymenoptera) and Tribolium castaneum (Coleoptera).]

Fig. 1. A. Neighbor joining tree of insect CD36 homologues (MEGA4, pairwise gap deletion due to sequence divergence). All non-SNMP sequences used were identical to those
reported in Nichols and Vogt (2008), where they are more fully described; SNMP sequences (Clade 3) are noted in Supplementary Materials (Table 2), and their relationships are
shown in more detail in Fig. 4. emp, crq, peste, ninaD, santa maria refer to characterized D. melanogaster genes (see Nichols and Vogt, 2008). Bootstrap support (1000 replicates) is
indicated for the major clades. B. Phylogeny of holometabolous lineages and species used in this study. Numbers indicate millions of years (Mya) since indicated lineages diverged
(see text).

Our previous characterization of the insect CD36 gene family relied
primarily on the annotated sequences provided in the genome databases
(Nichols and Vogt, 2008); many of these annotations were
truncated or otherwise missing elements. For the current study, we
used cDNA sequences of SNMP1 and SNMP2 from D. melanogaster and
A. aegypti, as well as BAC (Bacterial Artificial Chromosome) derived
genomic sequences from A. aegypti, as models for proposing corrected
sequences of orthologues in the D. pseudoobscura and A. gambiae genomes
and for identifying orthologues in the Culex pipiens q. genome. We
similarly used published Lepidoptera SNMP1 and SNMP2 cDNA
sequences to identify corresponding genes from the B. mori genome.
We then used these revised sequences to reanalyze the SNMP clade,
demonstrating the existence of two well supported sub-clades,
SNMP1 and SNMP2, throughout the dipteran and lepidopteran lineages,
and plausibly throughout the Holometabola.

2.  Methods 

2.1.  Animals  and  tissue  collection 

D.  melanogaster  (W1118)  used  in  this  study  were  reared  on 
a  standard  diet  (20  °C,  16h:8h  L:D);  3-5  day  old  adults  were 
collected,  frozen,  lyophilized,  and  processed  as  described  below.  A. 
aegypti  eggs  were  graciously  provided  by  Mark  Brown  (University 
of  Georgia,  Department  of  Entomology)  and  raised  on  a  larval  diet 
(pond  fish  food)  at  27  °C  (16h:8h  L:D).  Newly  ecdysed  pupae  were 
transferred  with  water  to  small  cups  in  cages  (BioQuip  Products, 
#1450B)  for  adult  emergence;  adults  were  provided  with  sugar 
water  (20%  sucrose)  via  cotton  wicks.  For  adult  collection,  cages 
containing  7-day-old  adults  (and  otherwise  empty)  were  lowered 
into  a  -70  °C  chest  freezer  and  flash  frozen;  bodies  were  collected 
from the cage floors still frozen and subsequently dried by lyophilization.
Lyophilized body parts (D. melanogaster or A. aegypti)
were  dissected  under  a  stereo  microscope  at  room  temperature  and 
collected  in  vials  (1.5  ml)  containing  95%  ethanol;  the  ethanol  was 
subsequently  removed  by  pipette,  with  the  final  residue  removed 
under  vacuum  (SpeedVac).  Dried  tissues  were  processed  as 
described  below  for  mRNA  isolation. 

2.2.  cDNA  and  genomic  sequencing  of  D.  melanogaster  and 
A.  aegypti  SNMPs 

PCR  primers  and  sequence  information  including  tissue  sources 
are  listed  in  Supplementary  Materials  (Tables  1  and  2).  cDNA  and 
genomic  sequences  obtained  through  this  project  and  submitted  to 
GenBank include partial sequences of SNMP1Dmel derived from
various tissues (EF596938 head, EF560171 wing, EF560170 leg), cDNA
sequences of SNMP2Dmel (EU189152), SNMP1Aaeg (EU246941)
and SNMP2Aaeg (EU189151), BAC derived genomic sequences of
snmp1Aaeg (FJ387158) and snmp2Aaeg (FJ387159), and sequences of
A. aegypti BAC clones BL7I-114 (FJ387160, including snmp1Aaeg) and
BL14K-12 (FJ387161, including snmp2Aaeg).

For SNMP expression studies (Fig. 2), mRNA was isolated from
tissues collected and pooled from 50 individuals (mixed gender
D. melanogaster or male or female A. aegypti); equivalent aliquots of
mRNA were processed for cDNA synthesis. Control target sequences
(Fig. 2) were ribosomal protein L32 (RpL32, D. melanogaster,
NM_079843) or ribosomal protein
S17 (RpS17, A. aegypti, AY927787). Non-quantitative PCR reactions
were carried out through 30 cycles (see below).

Messenger RNA was isolated using TRIzol® LS Reagent (Invitrogen)
or by acid-phenol extraction (Chomczynski and Sacchi, 1987).
cDNA was synthesized (SuperScript® III RT, Invitrogen) and amplified
by PCR (Platinum® Taq DNA Polymerase, regular for tissue specificity
studies or "high fidelity" for cloning; Invitrogen). PCR products were
gel purified as needed (Geneclean Turbo®, Q-BIOgene), and inserted
into plasmid vectors for subsequent analysis (TOPO® TA Cloning Dual
Promoter Kit, Invitrogen). Cloned PCR products were sequenced by
cycle sequencing by the University of South Carolina EnGenCore
(Joe Jones, Director). Near full length sequences of SNMP2Dmel,
SNMP1Aaeg and SNMP2Aaeg were obtained from cDNAs derived
from various tissues, and submitted to GenBank. Partial sequences
of SNMP1Dmel from head, wing and leg were also obtained and
submitted to GenBank.

[Figure 2. Gel image; lanes: D. mel (m/f): H, L, W; Ae. aeg. (f): A, L, W; Ae. aeg. (m): A, L, W.]

Fig. 2. SNMP1, SNMP2 and control PCR products were amplified (non-quantitatively)
from cDNAs of D. melanogaster and A. aegypti tissues using primers listed in Supplementary
Materials (Table 1). For D. melanogaster, male and female (m/f) tissues were
combined; for each tissue, SNMP2Dmel and control (RpL32) products were generated
under identical reaction conditions, except for primers; SNMP1Dmel products were from
a separate experiment and control products are not shown. For A. aegypti, male (m) and
female (f) tissues were analyzed separately; for each tissue, SNMP1Aaeg, SNMP2Aaeg and
control (RpS17) products were generated under identical reaction conditions, except for
primers. H, head with antennae; A, antennae; L, legs; W, wings.

The coding and upstream regions of the snmp1Aaeg and
snmp2Aaeg genes were obtained by sequencing two clones isolated
from  an  A.  aegypti  BAC  genomic  library  (Shizuya  et  al.,  1992;  Jimenez 
et  al.,  2004;  Lobo  et  al.,  2007);  the  library  was  screened  and  clones 
were  generously  provided  by  Dave  Severson  and  Becky  deBruyn 
(Notre  Dame  University).  BAC  clones  were  sequenced  using  454 
technology  by  the  University  of  South  Carolina  EnGenCore.  See 
Supplementary  Materials  (Methods)  for  details. 

2.3.  Annotation  of  SNMP  genes  from  available  genomes 

Many  of  the  SNMP  sequences  characterized  in  this  study  were 
previously  identified  (see  Nichols  and  Vogt,  2008);  new  sequences 
include  those  from  Culex  pipiens  q.  and  the  snmp2  gene  of  B.  mori. 
For  the  previous  study,  sequences  were  accepted  as  presented  in 
the  published  annotations.  For  the  current  study,  cDNA  sequences 
from  D.  melanogaster,  A.  aegypti  and  Manduca  sexta  were  used 
as  models  to  modify  these  annotations  for  detailed  analysis; 
annotations,  modifications  and  sequences  are  described  in  our 
Supplementary  Materials  (Methods,  Tables  2  and  3).  All  cDNA 
sequences  used  in  this  study  (e.g.  Fig.  4)  are  listed  in  Supplementary 
Materials  (Sequence  Data). 

2.4.  Alignments,  trees,  percent  identity,  dN/dS 

ClustalX v1.81 (Larkin et al., 2007) was used for amino acid
alignments, using default alignment parameters. MEGA4 (Tamura
et al., 2007) was used to construct Neighbor-joining trees (1000
bootstrap replicates, nodes collapsed to 50% bootstrap support) and
to derive percent identities and synonymous and non-synonymous
(dS, dN) values (Nei-Gojobori model with Jukes-Cantor correction).
For dS and dN calculations, nucleotide sequences were aligned to
their corresponding codon (amino acid) alignments, non-overlapping
ends were trimmed, any remaining start and stop codons
were removed and codons split by introns were removed.
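
The codon preparation step can be illustrated with a minimal Python sketch. It is not the procedure run in MEGA4; it only shows one way to thread an ungapped coding sequence onto its aligned protein row and then discard unusable codon columns. The function names, the `drop` parameter (column indices to exclude, e.g. start codons or codons split by introns) and the toy sequences are assumptions made for illustration. The sketch also drops every gapped column, which is a simplification of the end trimming described above.

```python
STOPS = {"TAA", "TAG", "TGA"}

def thread_codons(aligned_protein, cds):
    """Map an ungapped CDS onto a gapped protein row, one codon per residue."""
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    row, k = [], 0
    for residue in aligned_protein:
        if residue == "-":
            row.append("---")        # alignment gap becomes a gap codon
        else:
            row.append(codons[k])    # next codon of the coding sequence
            k += 1
    return row

def trim_columns(rows, drop=frozenset()):
    """Keep codon columns that are gap-free, stop-free and not listed in `drop`."""
    kept = []
    for idx, column in enumerate(zip(*rows)):
        if idx in drop:
            continue                 # e.g. start codon or intron-split codon
        if any(c == "---" or c.upper() in STOPS for c in column):
            continue
        kept.append(column)
    return ["".join(col[i] for col in kept) for i in range(len(rows))]

# Toy example (hypothetical sequences): column 0 is the start codon.
rows = [thread_codons("MK-L", "ATGAAACTG"),
        thread_codons("MKAL", "ATGAAAGCTTTA")]
trimmed = trim_columns(rows, drop={0})
```

Here `trimmed` holds two equal-length, in-frame nucleotide strings that could be passed to a dN/dS calculation.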

3.  Results 

Complementary DNAs encoding near-full length SNMP2 of D.
melanogaster (SNMP2Dmel) and SNMP1 and SNMP2 of A. aegypti
(SNMP1Aaeg and SNMP2Aaeg) were obtained by PCR, using primers
designed to annotated sequences previously identified by Nichols and
Vogt (2008). Genomic sequences for SNMP1Aaeg and SNMP2Aaeg
were obtained from BAC clones generously supplied by Dave Severson
and Becky deBruyn (Notre Dame University). Expression of
D. melanogaster and A. aegypti SNMP1 and SNMP2 was confirmed by
PCR (non-quantitative) of cDNAs synthesized from mRNAs isolated
from a variety of tissues including heads/antennae, legs and wings
(Fig. 2). The identities of these PCR products were confirmed by
sequencing. The near-full length cDNA and genomic (BAC) SNMP
sequences were submitted to GenBank, as were partial SNMP1Dmel
cDNA sequences derived from head, leg and wing tissues (see
Supplementary Materials: Table 2).

[Figure 3A. Alignment of SNMP1 amino acid sequences: SNMP1Dmel, SNMP1Dpse, SNMP1bCpip, SNMP1cCpip, SNMP1Agam, SNMP1Aaeg, SNMP1aCpip, SNMP1Bmor, SNMP1Msex; intron insertion sites are boxed and labelled by letter, with codon phases (0, 1, 2) indicated.]

Fig. 3. Alignment (ClustalX) of Diptera and Lepidoptera SNMP amino acid sequences; the sequences included are those for which a valid cDNA model is available. While there is no
available genomic sequence for M. sexta, this species is included for comparison with B. mori. 3A shows the SNMP1 proteins and 3B the SNMP2 proteins; these were aligned as one
group and separated for this presentation (six gaps in the SNMP1 alignment are against the SNMP2 alignment; one gap in the SNMP2 alignment is against the SNMP1 alignment).
Insertion sites are noted by boxes (3' amino acid of each exon); phases of boundary codons are noted (0, 1, 2). Homologous intron insertion sites are also noted (a-n) for cross
reference with Table 1.

The D. melanogaster and A. aegypti SNMP cDNA sequences, and
an available SNMP1Dmel cDNA, were used as models to identify and
annotate orthologues from the genomes of D. pseudoobscura,
A. gambiae and C. pipiens q. Genes encoding SNMP1 and SNMP2 of
B. mori (snmp1Bmor and snmp2Bmor) were identified and annotated
from the genome of B. mori using a published cDNA of
SNMP1Bmor and published cDNAs of SNMP2s identified from other
Lepidoptera. Please see Supplementary Materials (Methods) for
details of these activities, and Supplementary Materials (Tables 2
and 3) for all accession numbers and annotations, and for predicted
cDNA and amino acid sequences of all taxa used in this study.


Fig.  3  shows  an  alignment  of  SNMP  proteins  from  the  Diptera 
and  Lepidoptera  species.  This  figure  notes  intron  insertion  sites  and 
their  phase  (a  codon  not  split  by  the  intron  has  phase  0,  a  split 
codon  has  phase  1  or  2  depending  on  whether  the  split  is  between 
nucleotides  1-2  or  nucleotides  2-3).  Intron  insertion  sites  are  noted 
by  letters.  All  homologous  intron  insertion  sites  have  the  same 
phase,  both  within  a  specific  SNMP  and  between  SNMP1  and 
SNMP2,  supporting  the  evolutionary  relatedness  between  these 
genes.  Many  of  these  sites,  including  their  phase,  are  conserved 
between  Diptera  and  Lepidoptera. 
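
The phase convention described above follows directly from the cumulative coding length upstream of each intron. The short Python sketch below illustrates this under the assumption that exon lengths count coding nucleotides only, starting at the first base of the start codon; the function name and the example lengths are hypothetical and are not taken from Table 1.

```python
from itertools import accumulate

def intron_phases(coding_exon_lengths):
    """Return the phase (0, 1 or 2) of the intron following each exon but the last.

    Phase 0: the intron lies between two codons.
    Phase 1: the codon is split between nucleotides 1 and 2.
    Phase 2: the codon is split between nucleotides 2 and 3.
    """
    totals = list(accumulate(coding_exon_lengths))
    return [total % 3 for total in totals[:-1]]

# Hypothetical three-exon gene: 100, 52 and 30 coding nucleotides.
phases = intron_phases([100, 52, 30])   # [1, 2]
```

Two introns at homologous alignment positions would be expected to return the same phase when computed this way for each gene.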

Fig.  4  shows  a  Neighbor  Joining  tree  of  all  SNMP  sequences  listed 
in  Supplementary  Materials  (Tables  2  and  3).  The  topology  of  this 
tree  is  rooted  against  the  larger  CD36  gene  family  (Fig.  1A)  at 
a  position  noted  by  an  asterisk.  This  analysis  suggests  two  SNMP 
sub-clades:  SNMP1  and  SNMP2.  Among  the  Lepidoptera  and 
Diptera, only the dipteran mosquito C. pipiens q. shows an expansion
of the SNMP1 genes. Three SNMP1Cpip genes were identified
in  the  genome;  these  were  arrayed  in  an  uninterrupted  series  in 
a  region  spanning  31,381  bp;  the  three  genes  are  separated  by  4704 
and  7770  bp  respectively,  have  identical  intron  insertion  site 
topology  (Table  1),  and  share  67-81%  amino  acid  identity  (Table  2), 
all  strongly  suggesting  they  resulted  from  gene  duplication  events 
distinct  from  the  A.  aegypti  and  A.  gambiae  lineages. 

[Figure 3B. Alignment of SNMP2 amino acid sequences: SNMP2Dmel, SNMP2Dpse, SNMP2Aaeg, SNMP2Cpip, SNMP2Agam, SNMP2Bmor, SNMP2Msex; intron insertion sites are boxed and labelled by letter, with codon phases (0, 1, 2) indicated.]

Fig. 3. (continued).


The Neighbor Joining tree (Fig. 4) also includes sequences from
Hymenoptera (A. mellifera) and Coleoptera (T. castaneum). SNMP1 and
SNMP2 genes are represented in both taxa. T. castaneum shows an
apparent Coleoptera-specific expansion of the SNMP1 genes. Four
SNMP1Tcas genes are arrayed in an uninterrupted series in a region
spanning 12,727 bp; annotated gene sizes range from 1885-31,381 bp,
intervals between genes range from 260 to 1820 bp, and each annotation
indicates 7 or 8 exons. The proximity of these 4 SNMP1Tcas genes
strongly suggests they derived from a gene duplication event. We have
not further analyzed the A. mellifera and T. castaneum SNMP genes due
to an absence of appropriate cDNA models.

Table  1  compares  the  presence  of  homologous  intron  insertion 
sites  and  the  sizes  of  homologous  introns  and  exons  of  those 
sequences  shown  in  Fig.  3.  Some  introns,  especially  those  in 
A.  aegypti,  are  quite  large,  around  10-15  kb.  The  large  size  of  introns 
is  somewhat  species  specific,  correlating  with  the  concentration  of 
repetitive elements within those species' genomes (e.g. The International
Silkworm Genome Consortium, 2008; Nene et al., 2007). In
most  cases,  large  introns  did  not  obscure  the  identification  of 
coding  exons.  However,  initial  exons  were  often  difficult  to  predict 
if  their  model  genes  contained  a  short  first  exon  followed  by  a  large 
first  intron. 

Table  2  compares  the  amino  acid  sequence  identities  of  the 
dipteran  and  lepidopteran  SNMPs  included  in  Fig.  3.  Sequence 
identities  are  quite  high  when  comparing  the  same  gene  within  an 
insect  Order,  but  predictably  decrease  with  phylogenetic  distance. 
SNMP1  vs.  SNMP2  sequence  identities  are  quite  low,  consistent 
with  an  early  divergence  for  these  two  gene  sub-clades. 
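
As a hedged illustration of how a percent identity of the kind reported in Table 2 relates to p-distance, the following Python sketch computes the fraction of differing sites between two aligned sequences after dropping gapped columns, then converts it to percent identity. The function name and the toy sequences used to exercise it are assumptions for illustration, not data from this study, and MEGA4's own implementation (with complete gap deletion across the whole alignment) may differ in detail.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Return percent identity = 100 * (1 - p-distance) for two aligned sequences.

    Columns in which either sequence carries a gap are removed before
    counting (a pairwise form of gap deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap character.
    columns = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    # p-distance: proportion of compared sites that differ.
    differences = sum(1 for a, b in columns if a != b)
    p_distance = differences / len(columns)
    return 100.0 * (1.0 - p_distance)


if __name__ == "__main__":
    # Toy aligned peptides (hypothetical): one gapped column, one mismatch.
    print(round(percent_identity("ACDE-G", "ACDFHG"), 1))
```

In this toy case five ungapped columns remain and one of them differs, so the p-distance is 0.2 and the identity is 80%.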

Table 3 compares the synonymous (dS) and non-synonymous
(dN) changes that have occurred in the coding nucleotide
sequences of the dipteran SNMP1 and SNMP2 genes. In general,
dN/dS values are low: 0.08-0.30 for SNMP1, 0.03-0.30 for SNMP2.
These values suggest that negative (purifying) selection, and not
positive selection, is acting on these genes (Nei and Gojobori, 1986),
and therefore that each of these SNMP genes is expressed as
a functional protein (Torrents et al., 2003).
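
To make this interpretation concrete, the short Python sketch below recomputes dN/dS from three (dN, dS) pairs reported in Table 3 and labels each ratio by the conventional reading (below 1, purifying selection; above 1, positive selection). The function names and the dictionary labels are illustrative assumptions; only the numerical dN and dS values are taken from Table 3.

```python
def dn_ds_ratio(dn: float, ds: float) -> float:
    """Ratio of non-synonymous (dN) to synonymous (dS) distance."""
    if ds <= 0:
        raise ValueError("dS must be positive to form a ratio")
    return dn / ds


def selection_label(ratio: float) -> str:
    """Conventional reading of a dN/dS ratio."""
    if ratio < 1:
        return "purifying"
    if ratio > 1:
        return "positive"
    return "neutral"


# (dN, dS) pairs as reported in Table 3.
pairs = {
    "SNMP1, Dmel vs Dpse": (0.12, 1.57),
    "SNMP1, highest reported ratio": (0.23, 0.77),
    "SNMP2, Dmel vs Dpse": (0.05, 1.85),
}

for name, (dn, ds) in pairs.items():
    ratio = dn_ds_ratio(dn, ds)
    print(f"{name}: dN/dS = {ratio:.2f} ({selection_label(ratio)})")
```

The recomputed ratios (0.08, 0.30 and 0.03) agree with the tabulated values, and all fall well below 1.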

4.  Discussion 

This study focuses on the SNMP genes of holometabolous
insects, a lineage which emerged within the Neoptera
around 300 Mya and comprises >80% of named insect species and
the most successful (by number of species) insect Orders (Fig. 1B;
see Nichols and Vogt, 2008). SNMPs are related to a larger gene
family characterized by the human fatty acid transporter CD36,
a membrane protein with a broad range of described roles that
include cholesterol transport by macrophage cells, cell-cell
recognition or cytoadhesion between a variety of cells, and fatty acid
recognition in taste receptor cells (e.g. Rasmussen et al., 1998;
Gilbertson et al., 2005; Calder and Deckelbaum, 2006; Febbraio and
Silverstein, 2007; Fukuwatari et al., 1997; Rac et al., 2007; Gaillard
et al., 2008a, b). In our previous study (Nichols and Vogt, 2008), we
characterized and reviewed the CD36 gene family in insects using
available genome sequences, all of which were within the
holometabolous lineage. We suggested that the insect CD36 gene family
comprises 3 major clades, one of which includes the SNMP
genes. Several of the D. melanogaster CD36 family members outside
the SNMP clade have been characterized and shown to possess
functions similar to those of CD36: NinaD and Santa Maria function as
fatty-acid transporters (carotene) (e.g. Giovannucci and Stephenson,
1999; Kiefer et al., 2002; Yang and O'Tousa, 2007; Wang et al.,
2007); Croquemort and Peste function through cell-cell
interactions to mediate attacks on apoptotic cells or bacteria (e.g.
Franc et al., 1996; Stuart et al., 2005; Philips et al., 2005). Within the
SNMP clade, SNMP1 has been shown to be required for the
chemosensory detection of the fatty acid pheromone CVA (Benton
et al., 2007; Jin et al., 2008), perhaps similar to a reported
association of CD36 with mammalian taste cells and its possible function
in fat detection (Fukuwatari et al., 1997; Gilbertson et al., 2005;
Gaillard et al., 2008a, b). Our previous study demonstrated structural
similarities of the insect CD36-related genes that argued for a common
origin, and further suggested that homologues of these genes are
represented in species throughout the holometabolous lineage
(Nichols and Vogt, 2008).

Fig. 4. Neighbor joining tree of SNMP sequences noted in Supplementary Materials (Table 2) (MEGA4, complete gap deletion); bootstrap support is indicated by symbol, with
branches collapsed at 50% (unmarked nodes have 50-79% bootstrap support). This tree is unrooted; however, the asterisk notes the position of the Clade 3 node in Fig. 1A.

Our  previous  study  (Nichols  and  Vogt,  2008)  suggested  there 
may  be  multiple  subgroups  within  the  SNMP  clade.  For  the  current 
study,  we  cloned  and  sequenced  SNMP  genes  from  D.  melanogaster 
and Ae. aegypti and used these to remodel available SNMPs and to
identify  SNMPs  from  additional  species;  we  also  used  available 
Lepidoptera  SNMP  sequences  to  identify  SNMP  genes  from  B.  mori. 
Analysis  of  these  remodeled  and  additional  sequences  suggests  the 
insect  SNMPs  are  organized  into  two  sub-clades,  SNMP1  and 
SNMP2,  presumably  deriving  from  a  common  ancestor. 

Both SNMP1 and SNMP2 are expressed in a variety of tissues. In
their study of SNMP1Dmel function in antennal olfactory neurons,
Benton et al. (2007) noted expression in both non-antennal and
antennal tissue; within the antennae it was reported in neurons
and non-neurons (presumably sensilla support cells). For the
current study, we cloned and sequenced SNMP1Dmel mRNAs from
head (including antennae), leg and wing tissues (see
Supplemental Materials Table 2). PCR analysis showed SNMP1Dmel,
SNMP2Dmel, SNMP1Aaeg and SNMP2Aaeg expression in heads/antennae,
legs and wings (Fig. 2). We identified SNMP2Bmor
cDNAs in larval EST libraries derived from maxillary galea
(chemosensory antenna), silk glands and midgut (see Supplemental
Materials Table 2). Two recent studies have shown that
SNMP1Dmel is required for the detection of the pheromone CVA
and proposed specific molecular models underlying this
requirement through direct interaction with the CVA receptor protein,
either mediating the transfer of CVA from odorant binding protein
to receptor (Benton et al., 2007), or acting as an inhibitory subunit
of the receptor (Jin et al., 2008). However, the broad expression
pattern of the SNMPs suggests that the function of these genes
may be more general than those proposed, or that the SNMPs have
diverse functions specific to the different tissues. Mammalian
CD36, the defining member of the overall gene family, is also
expressed in a wide range of tissues and displays a range of
phenotypes that might broadly be described as fatty acid
transport and cell-cell recognition, similar to the D. melanogaster CD36
homologues NinaD, Santa Maria, Croquemort and Peste (see
Nichols and Vogt, 2008). The question remains whether SNMPs
show similar functions.
Table 1

Comparison of intron/exon sizes for homologous introns.


Intron (In) positions (a-n) are noted in Fig. 3; intron sizes are noted in yellow cells and exon sizes in white cells. An entry of “1” indicates a non-annotated 5’ exon, but an
identifiable homologous intron boundary at the 5’ end of the next exon. “?” under Bm-S2 indicates a region of missing sequence data (see Supplementary Materials: Methods
and Sequence Data). Dm, D. melanogaster; Dp, D. pseudoobscura; Aa, Ae. aegypti; Ag, An. gambiae; Cp, C. pipiens qu.; Bm, B. mori; S1, SNMP1; S2, SNMP2. (For interpretation of
the references to colour in this table legend, the reader is referred to the web version of this article.)


A challenge of mining a genome database is confirming that the genes
are expressed as functional proteins. Here, we used analysis of
non-synonymous and synonymous nucleotide changes (dN/dS) to suggest
that the dipteran SNMPs analyzed in this study are expressed and
functional. dN/dS analysis is a test for evolutionary selection acting on
homologous DNA sequences. Nucleotides of individual homologous
codons are compared, noting nucleotide changes that change the
amino acid (non-synonymous changes) and nucleotide changes that
do not change the amino acid (synonymous changes). Assuming that
all nucleotides have an equal probability of changing over
evolutionary time, a disproportionate number of non-synonymous or
synonymous changes suggests positive (dN/dS > 1) or negative
(dN/dS < 1) selection, respectively (e.g. Hughes and Nei, 1988).
For example, analysis for positive selection has been used to attempt
identification of ligand binding sites in chemosensory receptors (e.g.
Tunstall et al., 2007). More importantly, selection can arguably only
act on genes that are in fact expressed and functional, and therefore
dN/dS analysis should be useful to indirectly suggest that a gene is
expressed and functional. Torrents et al. (2003) compared 1659
functional and 1703 pseudo- (presumed non-functional) genes from
the human genome database: dN/dS values were broadly distributed
for both categories, but about 90% of expressed genes had values at or
below 0.2, while about 80% of pseudogenes had values at or above
0.2. Our analysis (Table 3) yielded values ranging from 0.03 to 0.30,
with most below 0.20; these values suggest that most if not all of the
SNMPs are expressed as functional proteins.
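
The codon-by-codon comparison described above can be sketched in Python as follows. This is a deliberately simplified illustration and not the Nei-Gojobori procedure implemented in MEGA4: it only tallies differing codon pairs as synonymous (same amino acid) or non-synonymous (different amino acid), and it omits the normalization by the numbers of synonymous and non-synonymous sites and the multiple-hit correction that are needed to obtain dN and dS. The function name and the toy sequences are assumptions for illustration; the codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codon_changes(cds_a: str, cds_b: str) -> tuple[int, int]:
    """Count differing codons as (synonymous, non_synonymous).

    Both inputs must be aligned, in-frame coding sequences of equal length.
    Codons containing gaps or ambiguous bases are skipped.
    """
    if len(cds_a) != len(cds_b) or len(cds_a) % 3 != 0:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    synonymous = non_synonymous = 0
    for i in range(0, len(cds_a), 3):
        codon_a = cds_a[i:i + 3].upper()
        codon_b = cds_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue  # identical codons carry no change
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            continue  # gap or ambiguous base: not comparable
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1
        else:
            non_synonymous += 1
    return synonymous, non_synonymous


if __name__ == "__main__":
    # Toy example: CTT -> CTC keeps Leu; AAA -> AGA changes Lys to Arg.
    print(count_codon_changes("ATGCTTAAA", "ATGCTCAGA"))
```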

Using dN/dS analysis to indirectly suggest functional expression
should be useful for studies involving large gene families such as
chemosensory genes (including odor receptors, odorant binding
proteins, gustatory receptors), and especially where genome
sequences are available from closely related species, permitting
comparison of orthologous sequences. Direct methods such as
PCR (e.g. Robertson and Wanner, 2006) or microarray surveys (e.g.
Zhang et al., 2004) require considerable effort and only confirm


R.G.  Vogt  et  al.  /  Insect  Biochemistry  and  Molecular  Biology  39  (2009)  448-456 




Table 2

Percent identities.

               SNMP1                                                 SNMP2
%ID              Dm    Dp  Cp1b  Cp1c    Ag    Aa  Cp1a    Ms    Bm    Dm    Dp    Aa    Cp    Ag    Ms
SNMP1Dpse        83
SNMP1bCpip       51    53
SNMP1cCpip       50    49    81
SNMP1Agam        51    52    74    75
SNMP1Aaeg        51    51    73    75    75
SNMP1aCpip       49    48    67    70    67    68
SNMP1Msex        42    40    43    42    40    42    41
SNMP1Bmor        40    41    43    41    39    41    43    74
SNMP2Dmel        27    26    27    27    27    27    27    29    29
SNMP2Dpse        27    26    27    27    27    28    27    30    29    93
SNMP2Aaeg        26    27    29    29    29    27    31    31    29    48    49
SNMP2Cpip        26    28    29    29    29    27    30    30    30    49    48    88
SNMP2Agam        27    28    29    28    28    27    30    31    31    48    48    84    81
SNMP2Msex        24    26    29    29    30    29    29    28    28    31    31    30    31    30
SNMP2Bmor        29    30    30    29    31    30    30    30    29    29    29    28    29    30    68

Percent Identity: p-distance (fractional absolute identity differences) values calculated using MEGA4, following gap deletion of the alignment shown in Fig. 3. Values were
converted to percent identity. For species abbreviations, see Table 1 legend.


Table 3

dN/dS and Percent Identities.

dN/dS        SNMP1Dmel        SNMP1Dpse        SNMP1Aaeg        SNMP1Agam        SNMP1aCpip       SNMP1bCpip
SNMP1Dpse    0.08(0.12/1.57)  X
SNMP1Aaeg    #(0.44/#)        #(0.42/#)        X
SNMP1Agam    #(0.46/#)        #(0.44/#)        0.09(0.18/1.91)  X
SNMP1aCpip   #(0.48/#)        0.26(0.47/1.84)  0.15(0.23/1.50)  0.18(0.26/1.42)  X
SNMP1bCpip   #(0.46/#)        0.18(0.40/2.22)  0.12(0.18/1.49)  0.12(0.19/1.62)  0.30(0.23/0.77)  X
SNMP1cCpip   #(0.46/#)        0.17(0.44/2.64)  0.10(0.17/1.75)  0.13(0.20/1.54)  0.27(0.21/0.79)  0.15(0.12/0.81)

dN/dS        SNMP2Dmel        SNMP2Dpse        SNMP2Aaeg        SNMP2Agam
SNMP2Dpse    0.03(0.05/1.85)  X
SNMP2Aaeg    0.20(0.51/2.58)  0.25(0.50/2.04)  X
SNMP2Agam    #(0.51/#)        0.30(0.51/1.75)  0.08(0.12/1.57)  X
SNMP2Cpip    #(0.49/#)        0.25(0.50/1.98)  0.04(0.07/1.74)  0.09(0.13/1.46)

dN/dS: ratios are shown, followed in parentheses by the actual dN and dS values (dN/dS) calculated using MEGA4 (Tamura et al., 2007). “#” indicates uncomputed values due to synonymous values
approaching saturation. Gaps, start and stop codons, and split codons at exon boundaries were removed for this analysis.


mRNA expression, not the expression of a functional protein;
selection-based analysis, while indirect, assumes functional protein
expression. Our study suggests a limit for the applicability of this
approach in the evolutionary distance (time) between compared
species (Fig. 1B): comparisons between fly species (65-43 Mya) and
between mosquito species (200-245 Mya) generated values, but
many comparisons between flies and mosquitoes (~240 Mya)
failed due to saturation of changes within sites (the formulae for dS
and dN approach infinity as the number of changes approaches
saturation). Thus the analysis may be limited to comparing species
which diverged within the past 200-250 million years.
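
The saturation behaviour can be illustrated with the Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3), which is commonly applied to the observed proportion p of differing synonymous or non-synonymous sites. Whether this exact correction underlies the values in Table 3 is an assumption here, since the table legend states only that MEGA4 was used. As p approaches 3/4, the argument of the logarithm approaches zero and d grows without bound; the hypothetical function below returns None at or beyond that point.

```python
import math
from typing import Optional


def jukes_cantor_distance(p: float) -> Optional[float]:
    """Jukes-Cantor corrected distance for an observed proportion p of differences.

    Returns None when p is at or beyond saturation (p >= 0.75), where the
    correction is undefined.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion between 0 and 1")
    argument = 1.0 - 4.0 * p / 3.0
    if argument <= 0.0:
        return None  # saturated: the logarithm is undefined
    return -0.75 * math.log(argument)


if __name__ == "__main__":
    for p in (0.10, 0.50, 0.70, 0.74, 0.75):
        d = jukes_cantor_distance(p)
        label = "not computable (saturated)" if d is None else f"{d:.2f}"
        print(f"p = {p:.2f}: d = {label}")
```

The corrected distance rises slowly at small p and steeply near p = 0.75, which is consistent with the uncomputed entries for the most distant comparisons.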

Members  of  the  SNMP1  and  SNMP2  sub-clades  of  the  insect 
CD36  gene  family  appear  to  be  represented  at  least  throughout  the 
holometabolous  lineage.  We  hope  this  study  will  provide  the 
community  with  information  that  will  encourage  further  study  of 
these  genes  in  a  broad  range  of  species.  SNMPs  appear  to  have 
important  functions  in  chemoreception;  comparative  analysis 
should  significantly  contribute  to  clarifying  those  functions. 